Phenome-wide Mendelian randomisation analysis of 378,142 cases reveals risk factors for eight common cancers

For many cancers there are only a few well-established risk factors. Here, we use summary data from genome-wide association studies (GWAS) in a Mendelian randomisation (MR) phenome-wide association study (PheWAS) to identify potentially causal relationships for over 3,000 traits. Our outcome datasets comprise 378,142 cases across breast, prostate, colorectal, lung, endometrial, oesophageal, renal, and ovarian cancers, as well as 485,715 controls. We complement this analysis by systematically mining the literature space for supporting evidence. In addition to providing supporting evidence for well-established risk factors (smoking, alcohol, obesity, lack of physical activity), we also find sex steroid hormones, plasma lipids, and telomere length as determinants of cancer risk. A number of the molecular factors we identify may prove to be potential biomarkers. Our analysis, which highlights aetiological similarities and differences in common cancers, should aid public health prevention strategies to reduce cancer burden. We provide an R/Shiny app to visualise findings.

Most MR studies of cancer have been predicated on assumptions about disease aetiology or have sought to evaluate purported associations from conventional observational epidemiology 3,4. A recently proposed agnostic strategy, termed MR-PheWAS, integrates the phenome-wide association study (PheWAS) with MR methodology to identify potential causal relationships among previously unexamined traits 5.
To identify potentially causal relationships for eight common cancers (breast, prostate, colorectal (CRC), lung, endometrial, oesophageal, renal cell carcinoma (RCC) and ovarian) and to reveal intermediates of risk, we conducted an MR-PheWAS utilising 378,142 cases and 485,715 controls. We integrated findings with a systematic mining of the literature space to provide supporting evidence and derive a more comprehensive description of disease aetiology (Fig. 1B) 6.

Phenotypes and genetic instruments
After filtering, we analysed 3,661 traits, proxied by 336,191 genetic variants, in conjunction with summary genetic data from published GWAS of breast, prostate, colorectal, lung, endometrial, oesophageal, renal, and ovarian cancers (Table 1; Supplementary Data 1). The number of single nucleotide polymorphisms (SNPs) used as genetic instruments for each trait ranged from 1 to 1,335. Figure 2 shows the power of our MR study to identify potentially causal relationships between each of the genetically defined traits and each cancer type. The median proportion of variance explained (PVE) by SNPs used as instrumental variables (IVs) for each of the 3,661 traits evaluated as risk factors was 3.4% (0.01-84%). Our a priori power to demonstrate relationships for each cancer type inevitably reflects, in part, the size of the respective GWAS datasets (Supplementary Data 2).

Causal associations predicted by MR
To aid interpretation, we grouped traits related to established cancer risk factors (i.e. smoking, obesity and alcohol) and those for which current evidence is inconclusive into the following categories, using a similar approach to Markozannes et al. 4: cardiometabolic; dietary intake; anthropometrics; immune and inflammatory factors; fatty acid (FA) and lipoprotein metabolism; lifestyle, reproduction, education and behaviour; metabolomics and proteomics; miscellaneous.
Given the large number of traits being evaluated, we categorised the support for potentially causal relationships between non-binary traits and cancers into four hierarchical levels of statistical significance a priori: robust, probable, suggestive, and non-significant (Fig. 3; Methods). Of the 27,066 graded associations, MR analyses provided robust evidence of a potentially causal relationship for 123 (0.5% of total MR analyses), probable evidence for 174 (0.6% of total) and suggestive evidence for 1,652 (6% of total). Across the eight cancer types, the largest number of robust associations was observed for endometrial cancer (n = 37), followed by RCC (n = 32), CRC (n = 21), lung (n = 20), breast (n = 10), oesophageal (n = 3) and prostate cancer (n = 1). No robust MR associations were observed for ovarian cancer (Supplementary Data 3).
Across all the cancer types, anthropometric traits showed the highest number of robust relationships (n = 32; 0.1%), followed by lifestyle, reproduction, education, and behaviour (n = 17; 0.06%). No robust associations were observed for the dietary intake or cardiometabolic categories (Supplementary Data 3).
To visualise the strength and direction of effect of the relationship between each of the traits examined and risk of each cancer type and, where appropriate, their respective subtypes, we provide an R/Shiny app (https://software.icr.ac.uk/app/mrcan). Figure 4 shows a screenshot of the app for selected traits across the eight different types of cancer.
In addition to a robust association between higher HLA-DR dendritic plasmacytoid levels and risk of prostate cancer (OR SD = 1.05, 95% CI: 1.03-1.06, P = 5.22 × 10 −10), 26 probable associations involving genetically predicted levels of other circulating immune and inflammatory factors were shown across the cancers studied. These included higher levels of IL-18 with reduced risk of lung cancer (Probable, OR SD = 0.89, 95% CI: 0.83-0.96, P = 2.00 × 10 −3), with specificity for lung cancer in never smokers. For proteomic traits, we conducted a Bayesian colocalisation analysis to determine whether genetic variants influencing protein levels and cancer risk are shared, considering the strongest proteomic associations with a clear gene target and a cis-IV (i.e. within 1 Mb; Methods) with P-value < 1 × 10 −6 in the outcome cancer. We identified KDEL motif-containing protein 2 (KDELC2) and RCC, as well as Copine-1 (CPNE1) and Immunoglobulin superfamily containing leucine-rich repeat protein 2 (ISLR2) and breast cancer, as having a high posterior probability of a shared variant (i.e. PP H4 > 0.8).
In contrast, Kunitz-type protease inhibitor 2 (SPINT2) and prostate cancer, as well as Semaphorin-3G (SEMA3G) and CRC, were shown to have distinct variants at the gene target (i.e. PP H3 > 0.8; Supplementary Data 6). Results for the IV at Histo-blood group ABO system transferase (ABO) with ovarian cancer were indeterminate (PP H4 = 0.67 and PP H3 = 0.33).
Our MR analysis provides support for a relationship between rectal polyps and CRC (β = 95.59, Standard Error (SE) = 4.99, P = 6.88 × 10 −82) 18, benign breast disease and breast cancer 19, and oesophageal reflux with risk of oesophageal cancer (β = 0.27, SE = 0.08, P = 1.30 × 10 −3) (Supplementary Data 7) 20. Other associations included possible relationships between pulmonary fibrosis and lung cancer 21, as well as the relationship between a diagnosis of schizophrenia and lung cancer (β = 0.10, SE = 0.04, P = 2.89 × 10 −2), which has been previously reported in conventional epidemiological studies 22. It was noteworthy, however, that we did not find evidence to support the purported relationship between hypertension and risk of developing RCC 23. Similarly, our analysis did not provide evidence to support a causal relationship between either type 1 or type 2 diabetes and an increased cancer risk.

Multivariable MR of biologically related traits
Selected traits within our analysis may show pleiotropic effects with other traits, and work by Burgess et al. 24 has shown that MR can only assess the causal effect of a risk factor on an outcome by using genetic variants that are solely associated with the risk factor of interest. To address pleiotropy we performed multivariable MR (MVMR) as a form of mediation analysis focusing on known biologically related traits. Specifically, we examined the role of IGF-1 and height on breast and colorectal cancer risk 25; lipid traits on breast and colorectal cancer risk 26,27; and fasting insulin, sex hormone-binding globulin (SHBG) levels, BMI and testosterone on endometrial cancer risk 28 (Supplementary Data 8). In the MVMR analysis of HDL-C, LDL-C and triglyceride levels, we found that the relationships of increasing HDL-C with breast cancer risk and increasing LDL-C with colorectal cancer risk remained significant in a model accounting for these biologically related traits (OR MVMR = 1.06, P MVMR = 0.03 and OR MVMR = 1.09, P MVMR = 0.04, respectively). Considering height and IGF-1 and their association with CRC risk and breast cancer risk, IGF-1 remained significantly associated with breast cancer risk (OR MVMR = 1.06, P MVMR = 0.049), while height remained significantly associated with colorectal cancer risk (OR MVMR = 1.06, P MVMR = 0.045). In contrast, for CRC, IGF-1 became non-significant (P = 0.16), which may suggest that the relationship between IGF-1 levels and CRC is mediated through the relationship with height. Finally, MVMR of fasting insulin, SHBG, BMI and testosterone and their effect on endometrial cancer attenuated the significance of association (P > 0.5) of fasting insulin and bioavailable testosterone with the outcome, while SHBG and BMI remained significant, but with a modest decrease in effect size (OR MVMR = 0.61, P MVMR = 0.02 and OR MVMR = 1.65, P MVMR = 6.37 × 10 −5, respectively). This suggests that bioavailable testosterone and fasting insulin do not have an independent effect on endometrial cancer risk and that the associations are likely to be mediated, at least in part, through SHBG and BMI.
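To make the logic of the multivariable model concrete, the following minimal Python sketch fits an inverse-variance weighted regression of SNP-outcome effects on the SNP effects for two exposures simultaneously, with no intercept. It is an illustration only and is not the mv_multiple routine used in the analysis; the function name, the restriction to two exposures and the toy numbers are assumptions introduced here, and standard errors are omitted for brevity.

```python
def mvmr_two_exposures(bx1, bx2, by, se_y):
    """Point estimates from a two-exposure multivariable IVW regression.

    bx1, bx2 : per-SNP effects on exposure 1 and exposure 2
    by, se_y : per-SNP effects on the outcome (log OR) and their SEs
    Solves the weighted normal equations of by ~ bx1 + bx2 (no intercept).
    Hypothetical helper for illustration, not a library function.
    """
    w = [1.0 / s ** 2 for s in se_y]  # inverse-variance weights
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    s2y = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    if det <= 1e-12 * s11 * s22:
        # Exposures proxied by (near-)collinear instruments cannot be separated
        raise ValueError("exposure effects are collinear across instruments")
    theta1 = (s1y * s22 - s2y * s12) / det  # direct effect of exposure 1
    theta2 = (s2y * s11 - s1y * s12) / det  # direct effect of exposure 2
    return theta1, theta2


# Toy data in which only exposure 1 acts on the outcome (true effects 0.5 and 0)
bx1 = [0.10, 0.20, 0.30, 0.10]
bx2 = [0.20, 0.10, 0.05, 0.30]
by = [0.5 * a for a in bx1]
theta1, theta2 = mvmr_two_exposures(bx1, bx2, by, [0.01] * 4)
```

In this toy setting the second exposure's coefficient shrinks to zero once the first is in the model, which is the pattern interpreted above as a lack of independent effect.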

Literature-mined support for MR defined relationships
Fig. 3 | Potentially causal relationships between non-binary traits and cancers were categorised into four hierarchical levels of statistical significance a priori: robust (P IVW-RE < 1.4 × 10 −5, corresponding to a P-value of 0.05 after Bonferroni correction for multiple testing (0.05/3,500); P WME or P MBE < 0.05; predicted true causal direction; and >1 IVs), probable (P IVW-RE < 0.05; P WME or P MBE < 0.05; predicted true causal direction; and >1 IVs), suggestive (P IVW-RE < 0.05 or P WALD < 0.05), and non-significant (P IVW-RE ≥ 0.05 or P WALD ≥ 0.05). Weighted median estimates (WME) 52 and mode-based estimates (MBE) 53 were used in addition to an inverse variance weighted random-effects (IVW-RE) model to assess the robustness of our findings, while MR-Egger regression assessed the extent to which directional pleiotropy could affect causal estimates 54. MR-Steiger was used to ascertain that the exposure trait influenced the outcome and not vice versa 55. For binary traits, associations were classified as being supported (P < 0.05) or not supported (P > 0.05). MR, Mendelian randomisation; IV, instrumental variable.

To provide support for the associations and to gain molecular insights into the underlying biological basis of relationships, we performed triangulation through systematic literature mining. We identified 55,105 literature triples across the eight different cancer types and 680,375 literature triples across the MR defined putative risk factors (Supplementary Data 9). Overlapping risk factor-cancer pairings from our MR analysis yielded on average 49 potential causal relationships. Supplementary Data 10 stratifies the literature space size by trait category. While identified relationships with a small literature space could reflect deficiencies in semantic mapping, relationships with large literature spaces support triangulation. Supplementary Data 11 provides the complete list of potential mediators for each trait. Illustrating the use of triangulation using a large literature space (defined herein as >50 triples) to support potentially causal relationships, Fig. 5 highlights four notable examples (IGF-1, LAG-3, IL-18, and PRDX1).
IGF-1, which is reported to play a role in multiple cancers, appears to mediate its effect in part through beta-catenin and BRAF signalling, modulating CRC and breast cancer risk 29. Whilst LAG-3 inhibition is an attractive therapeutic target in restoring T-cell function, we demonstrate genetically elevated LAG-3 levels as being associated with reduced CRC, endometrial and lung cancer. In all three of these cancers, the association appears to be at least partly mediated through IL-10. The seemingly paradoxical relationship between LAG-3 levels and tumourigenesis may reflect potentiation of T-cell function by serum LAG-3 rather than cell membrane expressed LAG-3 30. We identify genetically predicted IL-18 levels as being associated with an increased risk of lung cancer. Our literature mining also supports a role for the decoy inhibitory protein, IL-18BP, as being a mediator of lung cancer risk, as well as IL-10, IL-12, IL-4 and TNF 31. Finally, PRDX1, a member of the peroxiredoxin family of antioxidant enzymes, interacts with the androgen receptor to enhance its transactivation, resulting in increased EGFR-mediated signalling and an increased prostate cancer risk 32.

Discussion
By performing an MR-PheWAS we have been able to agnostically examine the relationship between multiple traits and the risk of eight different cancer types, restricted only by the availability of suitable genetic instruments. Importantly, many of the traits we examined have not previously been the subject of conventional epidemiological studies or been assessed by MR. Comparing our work with a recent systematic review of previously published MR studies of cancer, fewer than 10% of the MR exposures in this study had been the subject of previous investigations 4. In addition, 85% of the traits that we found to be significant had not previously been examined. Even for risk factors that were examined in many previous analyses, the number of cases and controls in our study has afforded greater power to identify potential causal associations. This has allowed us to exclude large effects on cancer risk for most exposure traits examined.
In addition to predicting causal relationships for the well-established lifestyle traits, which validates our approach, we implicate other lifestyle factors that have been putatively associated with cancer risk by observational epidemiology. Examples include the protective effects of physical activity (Suggestive) on lung cancer risk, oily fish intake (Probable) on CRC risk and fresh/dried fruit intake (Probable) on breast cancer risk. Several of the potentially causal relationships we identify have been the subject of studies of individual traits and include the association of longer leukocyte telomere length (LTL) with increased risk of RCC and lung cancer (Robust); sex steroid hormones with risk of breast and endometrial cancer; and circulating lipids with CRC and breast cancer. Clustering of MR predicted causal effect sizes for each trait-cancer relationship highlights the importance of risk factors common to many cancers but also reveals differences in their impact, which are likely to reflect in part the underlying biology (Fig. 6).
Fig. 4 | Bubble plot of the potentially causal relationship between selected traits and risk of different cancers. The columns correspond to different cancer types. The colours on the heatmap correspond to the strength of associations (odds ratio) and their direction (red positively correlated, blue negatively correlated). P-values represent the results from two-sided tests and are unadjusted. The size of each node corresponds to the -log 10 P-value, with increasing size indicating a smaller P-value. In the available R/Shiny app (https://software.icr.ac.uk/app/mrcan), moving the cursor on top of each bubble will reveal the underlying MR statistics.

Using genetic instruments for plasma proteome constituents has allowed us to identify hitherto unexplored potential risk factors for a number of the cancers, including: the cytokine-like molecule FAM3D, which plays a role in host defence against inflammation-associated carcinogenesis, with lung cancer 33; the autophagy-associated cytokine cardiotrophin-1 with lung (Probable), endometrial (Suggestive), prostate (Suggestive) and breast (Suggestive) cancer; and the tumour progression-associated antigen CD63 with endometrial cancer 34,35. Levels of these and other plasma proteins potentially represent biomarkers worthy of future prospective studies. Furthermore, for proteomic traits with cis-IVs, previous work has found that an MR association with colocalisation evidence is associated with a higher likelihood of a particular target-indication pair being successful in drug discovery 36.
A principal assumption in MR is that variants used as IVs are associated with the exposure trait under investigation. We therefore used SNPs associated with exposure traits at genome-wide significance. Furthermore, only IVs from European populations were used to limit bias from population stratification. Our MR analysis does, however, have limitations. Firstly, we were limited to studying phenotypes with genetic instruments available. Moreover, traits such as food intake or television watching can be highly correlated with other exposures, making deconvolution of the causal risk factor problematic 37-39. While MVMR can be used to account for the correlation between traits, calculation of conditional F-statistics for dietary traits yielded weak instruments (F < 3), which precludes their inclusion in an MVMR model due to weak instrument bias. Secondly, correcting for multiple testing guards against false positives, especially when based on a single exposure-outcome pair. However, the potential for false negatives is not insubstantial. Since we have not adjusted for between-trait correlations, our associations are inevitably conservative. Thirdly, for several traits, we had limited power to demonstrate associations of small effect. Fourthly, and not unique to our MR analysis, our study is unable to deconvolute time-varying effects of genetic variants, as evidenced by the relationship between obesity and breast cancer risk 40. Finally, as with all MR studies, excluding pleiotropic IVs is challenging. To address this, we incorporated information from weighted median and mode-based estimate methods to classify the strength of potentially causal associations. For groups of traits susceptible to pleiotropy (e.g., lipids) we also demonstrated how their incorporation into an MVMR model can affect the relationship between these traits and outcome. There are inevitably limitations to such modelling, as exemplified by the strong relationship between plasma FA and risk of CRC, which has been shown to be driven by the pleiotropic FADS locus, which has a profound effect on the metabolism of multiple FA through its gene expression 17.
A major concern articulated regarding any MR-PheWAS is the need to provide supporting evidence from alternative sources. Herein we have sought to address this by conducting a systematic interrogation of the literature space and identifying potential intermediates to explain relationships. Furthermore, we performed MVMR to deconvolute relationships where multiple traits appear to influence cancer risk. Although literature mined data can be noisy and driven by publication bias, we have been able to provide a narrative of the potentially causal relationships for several risk factors, which are attractive candidates for molecular validation.
While complementary studies are required to delineate the exact biological mechanisms underpinning associations, our analysis nevertheless highlights important targets for primary prevention of cancer in the population. The limited power to robustly characterise relationships between some exposure traits and cancer in this study provides an impetus for larger MR studies. Finally, we recognise that MR is not infallible, and that replication and triangulation of findings using different data sources and, if possible, benchmarking against RCTs are highly desirable. Such efforts could identify additional factors as targets to reduce the overall burden of cancer.

Study design
Our study had four elements: firstly, the identification of genetic variants serving as instruments for exposure traits under investigation; secondly, the acquisition of GWAS data for the eight cancers; thirdly, MR analysis; fourthly, triangulation through literature mining to provide supporting evidence for potential causal relationships (Fig. 1B).

Genetic variants serving as instruments
SNPs considered as genetic instruments were identified from published studies or MR-Base (Supplementary Data 2). For each SNP, the corresponding effect estimate on a trait, expressed in per standard deviation (SD) units (assuming a per allele effect), and its standard error (SE) were obtained. Only SNPs with a minor allele frequency >0.01 and a trait association P-value < 5 × 10 −8 in a European population GWAS were considered as instruments. We excluded correlated SNPs at a linkage disequilibrium threshold of r 2 > 0.01, retaining SNPs with the strongest effect. For binary traits we restricted our analyses to traits with a medical diagnosis, excluding cancer. We removed duplicate exposure traits based on manual curation.
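The selection rules above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the pipeline actually used: the Snp record, the select_instruments helper and the dictionary of pairwise r 2 values are hypothetical, and in practice linkage disequilibrium would be taken from a reference panel.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    maf: float   # minor allele frequency
    beta: float  # per-allele effect on the trait, in SD units
    pval: float  # trait association P-value


def select_instruments(snps, r2, maf_min=0.01, p_max=5e-8, r2_max=0.01):
    """Apply the MAF and P-value filters, then greedily prune correlated SNPs.

    r2 maps frozenset({rsid_a, rsid_b}) to the pairwise LD r^2; pairs that
    are absent are treated as uncorrelated (an assumption of this sketch).
    """
    eligible = [s for s in snps if s.maf > maf_min and s.pval < p_max]
    # Visit SNPs from strongest to weakest effect so the strongest is retained
    eligible.sort(key=lambda s: abs(s.beta), reverse=True)
    kept = []
    for s in eligible:
        if all(r2.get(frozenset((s.rsid, k.rsid)), 0.0) <= r2_max for k in kept):
            kept.append(s)
    return kept


# Made-up example records
snps = [
    Snp("rsA", 0.20, 0.30, 1e-10),
    Snp("rsB", 0.30, 0.10, 1e-9),    # in LD with rsA, weaker effect
    Snp("rsC", 0.005, 0.40, 1e-12),  # fails the MAF filter
    Snp("rsD", 0.25, 0.15, 1e-5),    # fails the P-value filter
    Snp("rsE", 0.40, 0.20, 1e-9),
]
ld = {frozenset(("rsA", "rsB")): 0.5}
instruments = [s.rsid for s in select_instruments(snps, ld)]
```

Sorting by absolute effect before pruning is one way to honour "retaining SNPs with the strongest effect"; ranking by P-value would be an equally plausible reading.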

Statistical analysis
For each SNP, effects were estimated for cancer as an odds ratio (OR) per SD unit increase in the putative risk factor (OR SD), with 95% confidence intervals (CIs), using the Wald ratio 50. For traits with multiple SNPs as IVs, causal effects were estimated under an inverse variance weighted random-effects (IVW-RE) model as the primary measurement, as it is robust in the presence of pleiotropic effects, provided any heterogeneity is balanced at mean zero (Supplementary Data 3, 13-15) 51. Weighted median estimates (WME) and mode-based estimates (MBE) were obtained to assess the robustness of findings (Supplementary Data 16) 52,53. Directional pleiotropy was assessed using MR-Egger regression (Supplementary Data 17) 54. The MR Steiger test was used to infer the direction of potentially causal effect for continuous exposure traits (Supplementary Data 18) 55. For this we estimated the PVE using Cancer Research UK lifetime risk estimates for each tumour type (Supplementary Data 19). A leave-one-out strategy under the IVW-RE model was employed to assess the potential impact of outlying and pleiotropic SNPs (Supplementary Data 4) 56. This sensitivity analysis tests the effect of performing MR on the IVs leaving one SNP out in turn. It can be used to identify when one SNP is driving the association since, when this SNP is removed, we can expect to see an attenuation of the MR association significance. Because two-sample MR of a binary risk factor and a binary outcome can be biased, we primarily considered whether there exists a significant non-zero effect, and only report ORs for consistency 57. For proteomic traits which had an IV located cis (+/-1 Mb) of the gene target we performed colocalisation using coloc 58. This enumerates the four possible configurations of causal variants for two traits, calculating support for each model based on a Bayes factor. Adopting prior probabilities of p 1, p 2 = 1 × 10 −4 and p 12 = 1 × 10 −5, a posterior probability ≥0.80 was considered as supporting a specific model. For analyses of selected traits using MVMR we used the mv_multiple function in the TwoSampleMR package. MVMR was applied to investigate which of these traits within the same category had independent pleiotropic effects on a specific cancer. We restricted our MVMR analyses to traits which had ≥2 IVs and for which we had access to full summary statistics required for the analysis. Statistical analyses were performed using the TwoSampleMR package v0.5.6 (https://github.com/MRCIEU/TwoSampleMR) and the MendelianRandomization package in R (v3.4.0) 49.
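As a worked illustration of the two core estimators, the sketch below computes a Wald ratio for a single SNP and an IVW estimate with multiplicative random effects for several SNPs. It is written in Python for self-containment and is not the TwoSampleMR code used for the analysis; the function names are introduced here, the Wald ratio standard error is the first-order approximation, and the P-value is taken from a normal distribution.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP causal estimate (log OR per SD) with a first-order SE."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw_random_effects(beta_exp, beta_out, se_out):
    """IVW estimate from a weighted regression through the origin.

    Multiplicative random effects: the fixed-effect SE is scaled by the
    residual standard deviation, which is never allowed to fall below 1.
    """
    w = [1.0 / s ** 2 for s in se_out]
    sxx = sum(wi * bx * bx for wi, bx in zip(w, beta_exp))
    sxy = sum(wi * bx * by for wi, bx, by in zip(w, beta_exp, beta_out))
    theta = sxy / sxx
    k = len(beta_exp)
    # Cochran's Q: weighted squared residuals around the fitted slope
    q = sum(wi * (by - theta * bx) ** 2
            for wi, bx, by in zip(w, beta_exp, beta_out))
    phi = q / (k - 1) if k > 1 else 1.0
    se = math.sqrt(1.0 / sxx) * math.sqrt(max(1.0, phi))
    p = math.erfc(abs(theta / se) / math.sqrt(2.0))  # two-sided normal P-value
    return theta, se, p


def odds_ratio(theta, se):
    """Convert a log OR and its SE to an OR with a 95% confidence interval."""
    return (math.exp(theta),
            math.exp(theta - 1.96 * se),
            math.exp(theta + 1.96 * se))


# Toy summary statistics lying exactly on a slope of 0.5
bx = [0.1, 0.2, 0.3]
by = [0.05, 0.10, 0.15]
theta, se, p = ivw_random_effects(bx, by, [0.01, 0.01, 0.01])
```

A leave-one-out analysis of the kind described above would simply call ivw_random_effects repeatedly, dropping one SNP from each input list in turn.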

Estimation of study power
The power of MR to predict a causal relationship depends on the PVE by the instrument 59. We excluded instruments with an F-statistic <10 since these are considered indicative of weak instrument bias 60. We calculated conditional F-statistics for the traits using the condFstat function in the MendelianRandomization package 61 (Supplementary Data 20). We estimated the genetic correlation between traits using Linkage-Disequilibrium Adjusted Kinships (LDAK) software (Supplementary Data 21). We derived LD matrices for the genetic variants using the ld_matrix function in TwoSampleMR. We estimated study power, stipulating a P-value of 0.05 for each target a priori across a range of effect sizes as per Brion et al. (Supplementary Data 2) 62. Since power estimates for binary exposure traits and binary outcomes in a two-sample setting are unreliable, we did not estimate study power for binary traits 57.
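The link between PVE and instrument strength can be illustrated with the standard approximation F = [PVE / (1 - PVE)] × [(N - k - 1) / k], where N is the exposure GWAS sample size and k the number of SNPs. The Python sketch below is an assumption-laden illustration, not the calculation reported in the Supplementary Data: the function names and the sample size in the example are hypothetical, and only the 3.4% PVE and the threshold of 10 come from the text.

```python
def f_statistic(pve, n_samples, n_snps):
    """Approximate F-statistic of an instrument set from its PVE."""
    if not 0.0 < pve < 1.0:
        raise ValueError("pve must lie strictly between 0 and 1")
    return (pve / (1.0 - pve)) * ((n_samples - n_snps - 1) / n_snps)


def is_weak_instrument(pve, n_samples, n_snps, threshold=10.0):
    """Flag instrument sets falling below the conventional F < 10 cut-off."""
    return f_statistic(pve, n_samples, n_snps) < threshold


# Median PVE from the Results (3.4%) with a hypothetical N and SNP count
f_median = f_statistic(0.034, n_samples=10_000, n_snps=20)
```

For a fixed PVE the F-statistic falls as more SNPs are added, which is one reason why instrument sets with many weak variants can still fail the threshold.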

Assignment of statistical significance
The support for a causal relationship with non-binary traits was categorised into four hierarchical levels of statistical significance a priori: robust (P IVW-RE < 1.4 × 10 −5, corresponding to a P-value of 0.05 after Bonferroni correction for multiple testing (0.05/3,500); P WME or P MBE < 0.05; predicted true causal direction; and >1 IVs), probable (P IVW-RE < 0.05; P WME or P MBE < 0.05; predicted true causal direction; and >1 IVs), suggestive (P IVW-RE < 0.05 or P WALD < 0.05), and non-significant (P IVW-RE ≥ 0.05 or P WALD ≥ 0.05) (Supplementary Data 22). Robust associations are those that remain significant after correcting for multiple testing, for which the direction of effect is predicted to be from the exposure to cancer risk, and for which multiple MR methods report a significant association. We consider these associations to have the strongest statistical evidence, by virtue of the concordance between various MR methods and statistical validation tests. Probable associations are those that do not remain significant after correcting for multiple testing, but for which the remaining conditions are the same as for robust associations. We include this classification to account for the large number of traits tested in this analysis, noting that when taken in isolation these traits may be reported as having potentially causal associations with cancer. Suggestive associations are those that are significant at P < 0.05 but for which one of the following conditions is not met: the direction of effect is predicted to be from exposure to cancer outcome, or the multiple MR methods agree on a significant association. Additionally, significant associations for which only one SNP could be used as an IV are classified as suggestive. This was chosen to reflect the potential uncertainties that arise when performing MR using a Wald ratio test with a single IV. Finally, all other traits are classified as non-significant, indicating that a potentially causal association is unlikely. While non-significant associations can be due to low statistical power, they also indicate that a moderate causal effect is unlikely. For binary traits we classified associations as being supported (P < 0.05) or not supported (P > 0.05; Supplementary Data 6, 23-25).
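The grading rules can be expressed as a short decision function. The Python sketch below is one reading of the criteria above, not code from the study: the function name and argument names are introduced here, the robust threshold is computed as 0.05/3,500 rather than the rounded 1.4 × 10 −5, and p_primary stands for P IVW-RE when several IVs are available or P WALD when there is only one.

```python
BONFERRONI = 0.05 / 3500  # approximately 1.4e-5


def grade_association(p_primary, n_ivs, p_wme=None, p_mbe=None,
                      direction_ok=False):
    """Assign one of the four hierarchical evidence levels.

    p_primary    : IVW-RE P-value (multiple IVs) or Wald ratio P-value (one IV)
    n_ivs        : number of instrumental variables
    p_wme, p_mbe : weighted median and mode-based P-values, if available
    direction_ok : True if the Steiger test supports exposure -> outcome
    """
    if p_primary >= 0.05:
        return "non-significant"
    sensitivity = [p for p in (p_wme, p_mbe) if p is not None]
    corroborated = (n_ivs > 1 and direction_ok
                    and any(p < 0.05 for p in sensitivity))
    if not corroborated:
        return "suggestive"
    return "robust" if p_primary < BONFERRONI else "probable"


# P-values reported in the Results, with hypothetical sensitivity inputs
hla_dr = grade_association(5.22e-10, n_ivs=5, p_wme=0.001, direction_ok=True)
il18 = grade_association(2.00e-3, n_ivs=5, p_wme=0.01, direction_ok=True)
single_snp = grade_association(1e-8, n_ivs=1)
```

Because the checks are nested, a single-SNP association can never be graded above suggestive, however small its P-value.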

Support for causality
To strengthen evidence for causal relationships predicted from the MR analysis we exploited the semantic predications in the Semantic MEDLINE Database (SemMedDB), which is based on all PubMed citations 63. Within SemMedDB, pairs of terms are connected by a predicate; collectively these are known as 'literature triples' (i.e. 'subject term 1' - predicate - 'object term 2'). These literature triples represent semantic relationships between biological entities derived from published literature. To interrogate SemMedDB we queried MELODI Presto and EpiGraphDB to facilitate data mining of epidemiological relationships for molecular and lifestyle traits 64-66. For each putative risk factor-cancer pair the sets of triples were overlapped, and common terms identified to reveal potentially causal pathways and inform aetiology. Based on the information profile of all literature mined triples, we considered literature spaces with >50 literature triples as being viable, corresponding to 90% of the information content 67. We complemented this systematic text mining by referencing reports from the World Cancer Research Fund/American Institute for Cancer Research and the International Agency for Research on Cancer Global Cancer Observatory, as well as querying specific putative relationships in PubMed 7.
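The overlap step can be pictured as a set intersection over triples. The Python sketch below is a schematic under stated assumptions and does not reproduce the MELODI Presto or EpiGraphDB queries: the helper names are hypothetical, the triples are made-up placeholders (only the LAG-3, IL-10 and CRC link is taken from the Results), and an intermediate is assumed to be any term that is the object of a risk-factor triple and the subject of a cancer triple.

```python
def overlap_intermediates(exposure_triples, cancer_triples):
    """Return terms linking an exposure's literature space to a cancer's.

    Each triple is a (subject, predicate, object) tuple. A term qualifies
    if it is the object of an exposure triple and the subject of a cancer
    triple.
    """
    from_exposure = {obj for (_subj, _pred, obj) in exposure_triples}
    to_cancer = {subj for (subj, _pred, _obj) in cancer_triples}
    return sorted(from_exposure & to_cancer)


def is_viable_space(triples, min_triples=50):
    """Literature spaces with more than 50 triples are treated as viable."""
    return len(triples) > min_triples


# Placeholder triples for illustration only
exposure = [
    ("LAG-3", "STIMULATES", "IL-10"),
    ("LAG-3", "INTERACTS_WITH", "MHC class II"),
]
cancer = [
    ("IL-10", "ASSOCIATED_WITH", "Colorectal cancer"),
    ("TNF", "ASSOCIATED_WITH", "Colorectal cancer"),
]
mediators = overlap_intermediates(exposure, cancer)
```

Sorting the result keeps the output deterministic, which is convenient when comparing mediator lists across many risk factor-cancer pairs.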

Fig. 1 | Principles of Mendelian randomisation (MR) and study overview. A Assumptions in MR that need to be satisfied to derive unbiased causal effect estimates. Dashed lines represent direct causal and potential pleiotropic effects that would violate MR assumptions. A, indicates genetic variants used as IVs are strongly associated with the trait; B, indicates genetic variants only influence cancer risk through the trait; C, indicates genetic variants are not associated with any measured or unmeasured confounders of the trait-cancer relationship. SNP, single-nucleotide polymorphism. B Study overview. Genetic variants serving as instruments for exposure traits under investigation were identified from MR-Base or PubMed. GWAS data for the eight cancers was acquired and MR analysis was performed. Results were triangulated through literature mining to provide supporting evidence for potentially causal relationships. Created with BioRender.com. GWAS, genome-wide association study.

Fig. 2 | Power to predict causal relationships in the Mendelian randomisation analysis across the eight different cancers. Each line represents an individual trait with the line colour indicating the F-statistic, a measure of instrument strength. The analysis of most traits is well powered across a modest range of odds ratios. Generally, better powered traits are those with a higher F-statistic. F-stat, F-statistic.

Fig. 5 | Sankey diagram of literature spaces for exemplar cancer risk factors. These diagrams illustrate the relationship between exposure traits and cancers via their linked literature triples. The thickness of the line connecting two mediating traits indicates the frequency with which that triple is mentioned in the literature. Relationships for: A IGF-1 and colorectal cancer; B IL-18 and lung cancer; C LAG-3 and endometrial cancer; D PRDX1 and prostate cancer. AR androgen receptor, EGF epidermal growth factor, EGFR epidermal growth factor receptor, ESRK extracellular signal regulated kinases, GMCSF granulocyte-macrophage colony-stimulating factor, HACII histocompatibility antigens class II, IFNG interferon gamma, MM matrix metalloproteinases, MM9 matrix metalloproteinase 9, PHRP parathyroid hormone-related protein, PMH phosphoric monoester hydrolases, PPT phenylpyruvate tautomerase, PR progesterone receptor, RIG recombinant interferon-gamma, TF transcription factor, TNF tumour necrosis factor, TSG tumour suppressor genes, VEGFA vascular endothelial growth factor A.

Fig. 6 | Heatmap and dendrogram showing clustering of potentially causal associations between traits and cancer risk. Heatmap based on Z-statistics using the clustering method implemented in the pheatmap function within R. Colours correspond to the strength of associations and their direction (red positive association with risk, blue inverse association with risk). Trait classes are annotated on the left. Only traits showing an association for at least one cancer type are shown. Further heatmaps for individual classes of traits are shown in Supplementary Figs. 1-8.

Table 1 | Details of cancer genome-wide association studies used in the Mendelian randomisation analysis

Single nucleotide polymorphisms were harmonised to ensure that the effect estimates of SNPs on